This notebook presents an exploratory data analysis (EDA) of the Vancouver street tree dataset. The data were obtained from the City of Vancouver's Open Data Portal and are released under the Open Government Licence – Vancouver. The file used here is a randomly generated subset of the original data, so it may or may not be a representative sample, and the analysis done here may not give the full picture of what we are trying to find out.
The tagline for the province of British Columbia is "Beautiful British Columbia". The west coast of British Columbia, including Vancouver, has a moderate climate year-round, which makes it a very good tourist destination throughout the year. Spring and fall are especially colorful in Vancouver. The following questions are of interest for the EDA:
# Importing the Libraries needed for the Analysis
import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
Here the vancouver_trees.csv dataset is read and stored in an object named tree_data_all. The date_planted column is converted to a datetime dtype using the parse_dates argument of read_csv. Since we are interested in the trees on the streets of the city, we filter the dataset to keep only trees on the curb and store the result in tree_data_full.
tree_data_all = pd.read_csv('vancouver_trees.csv',parse_dates=['date_planted'])
tree_data_full = tree_data_all[tree_data_all['curb'] == 'Y']
tree_data_all.head()
| | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | civic_number | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | W 13TH AV | MAPLE ST | PSEUDOPLATANUS | Kitsilano | NaT | 9.00 | EVEN | ACER | N | 1996 | 10 | Y | 13310 | SYCAMORE MAPLE | 4 | 2900 | NaN | N | 49.259856 | -123.150586 |
| 1 | WALES ST | WALES ST | PLATANOIDES | Renfrew-Collingwood | 2018-11-28 | 3.00 | ODD | ACER | N | 5291 | 7 | Y | 259084 | PRINCETON GOLD MAPLE | 1 | 5200 | PRINCETON GOLD | N | 49.236650 | -123.051831 |
| 2 | W BROADWAY | W BROADWAY | RUBRUM | Kitsilano | 1996-04-19 | 14.00 | EVEN | ACER | N | 3618 | C | Y | 167986 | KARPICK RED MAPLE | 3 | 3600 | KARPICK | N | 49.264250 | -123.184020 |
| 3 | PENTICTON ST | PENTICTON ST | CALLERYANA | Renfrew-Collingwood | 2006-03-06 | 3.75 | EVEN | PYRUS | N | 2502 | 5 | Y | 213386 | CHANTICLEER PEAR | 1 | 2500 | CHANTICLEER | Y | 49.261036 | -123.052921 |
| 4 | RHODES ST | RHODES ST | GLYPTOSTROBOIDES | Renfrew-Collingwood | 2001-11-01 | 3.00 | ODD | METASEQUOIA | N | 5639 | N | Y | 189223 | DAWN REDWOOD | 2 | 5600 | NaN | N | 49.233354 | -123.050249 |
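The loading and filtering step above can be tried on a tiny in-memory sample. This is a minimal sketch: the three rows and the name `sample_csv` are made up for illustration, and only a few of the real columns are included.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking a few columns of vancouver_trees.csv
sample_csv = io.StringIO(
    "tree_id,genus_name,curb,date_planted\n"
    "1,ACER,Y,2018-11-28\n"
    "2,PRUNUS,N,1996-04-19\n"
    "3,PYRUS,Y,\n"
)

# parse_dates converts the listed columns to a datetime dtype; blanks become NaT
sample = pd.read_csv(sample_csv, parse_dates=["date_planted"])

# A boolean mask keeps only the trees planted on the curb
sample_curb = sample[sample["curb"] == "Y"]

print(sample.dtypes["date_planted"])
print(sample_curb["tree_id"].tolist())
```

Note that the missing planting date survives as NaT rather than raising an error, which matches the NaT seen in the first row of the real data.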
tree_data_all.describe()
| | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| count | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 | 30000.000000 |
| mean | 12.204926 | 2974.397233 | 128252.088500 | 2.717967 | 2948.410767 | 49.247574 | -123.106173 |
| std | 9.334234 | 2068.223585 | 75099.139102 | 1.555819 | 2069.837551 | 0.021178 | 0.049450 |
| min | 0.000000 | 0.000000 | 29.000000 | 0.000000 | 0.000000 | 49.200732 | -123.223870 |
| 25% | 4.250000 | 1319.000000 | 62110.250000 | 2.000000 | 1300.000000 | 49.230519 | -123.144596 |
| 50% | 10.000000 | 2646.000000 | 129058.000000 | 2.000000 | 2600.000000 | 49.248200 | -123.104022 |
| 75% | 17.250000 | 4063.250000 | 190963.750000 | 4.000000 | 4100.000000 | 49.263808 | -123.062734 |
| max | 317.000000 | 9201.000000 | 271025.000000 | 10.000000 | 9200.000000 | 49.294528 | -123.018258 |
The maximum height range ID is 10 and the mean is about 2.7. The maximum diameter of 317 is likely an outlier, so it should be dropped before any diameter-based calculation, as it can distort summary statistics such as the mean.
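One way to drop such a value is an interquartile-range (IQR) fence. The notebook does not say which rule it intends, so the 1.5 × IQR threshold below is an assumption, and the diameters are made-up values chosen to include one extreme reading.

```python
import pandas as pd

# Hypothetical diameters; 317 mimics the suspicious maximum reported by describe()
diam = pd.DataFrame({"diameter": [3.0, 4.25, 9.0, 10.0, 14.0, 17.25, 317.0]})

q1 = diam["diameter"].quantile(0.25)
q3 = diam["diameter"].quantile(0.75)

# Conventional upper fence (an assumed rule, not taken from the notebook)
upper = q3 + 1.5 * (q3 - q1)

# Keep only rows at or below the fence
diam_clean = diam[diam["diameter"] <= upper]
print(len(diam), "->", len(diam_clean))
```

A fixed cut-off chosen from domain knowledge would work just as well; the point is to make the rule explicit before computing means.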
tree_data_full.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 27454 entries, 0 to 29999
Data columns (total 20 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   std_street          27454 non-null  object
 1   on_street           27454 non-null  object
 2   species_name        27454 non-null  object
 3   neighbourhood_name  27454 non-null  object
 4   date_planted        13033 non-null  datetime64[ns]
 5   diameter            27454 non-null  float64
 6   street_side_name    27454 non-null  object
 7   genus_name          27454 non-null  object
 8   assigned            27454 non-null  object
 9   civic_number        27454 non-null  int64
 10  plant_area          27250 non-null  object
 11  curb                27454 non-null  object
 12  tree_id             27454 non-null  int64
 13  common_name         27454 non-null  object
 14  height_range_id     27454 non-null  int64
 15  on_street_block     27454 non-null  int64
 16  cultivar_name       15082 non-null  object
 17  root_barrier        27454 non-null  object
 18  latitude            27454 non-null  float64
 19  longitude           27454 non-null  float64
dtypes: datetime64[ns](1), float64(3), int64(4), object(12)
memory usage: 4.4+ MB
Next, we keep only the columns required for the analysis, as shown:
tree_data_full = tree_data_full[['on_street','neighbourhood_name','common_name','genus_name','on_street_block','latitude','longitude']]
To answer the first question of interest, the dataset needs to be filtered for the genuses associated with fall and spring. The genuses that lose their leaves in autumn and those that bloom in the spring were identified manually.
The genuses for spring are stored in a list named list_flowering. Similarly, the genuses for fall are stored in a list named list_decidous.
list_flowering = ['AMELANCHIER','CASTANEA','CATALPA','CERCIS','CHITALPA','CLADRASTIS','CORNUS','CRATAEGUS',
'DAVIDIA','KOELREUTERIA','LABURNUM','MAGNOLIA','MALUS','MANGLIETIA','MESPILUS','PAULOWNIA','PRUNUS',
'PYRUS','ROBINIA','SALIX','SOPHORA','STEWARTIA','STYRAX','SYRINGA']
list_decidous = ['ACER','AESCULUS','BETULA', 'CARPINUS', 'CASTANEA','CATALPA','CELTIS', 'CERCIDIPHYLLUM','CERCIS','CLADRASTIS','CORNUS','CORYLUS','CRATAEGUS',
'DAVIDIA','EUCOMMIA','EUONYMUS','FAGUS','FRAXINUS','GINKGO','GLEDITSIA','GYMNOCLADUS','JUGLANS','KOELREUTERIA','LARIX','LIQUIDAMBAR',
'LIRIODENDRON','MAGNOLIA', 'MALUS','MANGLIETIA', 'MESPILUS', 'METASEQUOIA', 'NOTHOFAGUS', 'NYSSA','OSTRYIA','OXYDENDRUM','PARROTIA',
'PLATANUS', 'POPULUS', 'PRUNUS', 'PTELEA', 'PTEROCARYA','PYRUS','QUERCUS', 'RHAMNUS', 'RHUS','ROBINIA','SALIX','SOPHORA', 'SORBUS',
'STEWARTIA', 'STYRAX', 'SYRINGA','TILIA','ULMUS','ZELKOVA']
The next step is to filter the dataframe to get separate dataframes for fall and spring.
tree_data_fall = tree_data_full[tree_data_full['genus_name'].isin(list_decidous)].reset_index(drop=True)
tree_data_fall.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | MAPLE ST | Kitsilano | SYCAMORE MAPLE | ACER | 2900 | 49.259856 | -123.150586 |
| 1 | WALES ST | Renfrew-Collingwood | PRINCETON GOLD MAPLE | ACER | 5200 | 49.236650 | -123.051831 |
| 2 | W BROADWAY | Kitsilano | KARPICK RED MAPLE | ACER | 3600 | 49.264250 | -123.184020 |
| 3 | PENTICTON ST | Renfrew-Collingwood | CHANTICLEER PEAR | PYRUS | 2500 | 49.261036 | -123.052921 |
| 4 | RHODES ST | Renfrew-Collingwood | DAWN REDWOOD | METASEQUOIA | 5600 | 49.233354 | -123.050249 |
tree_data_spring = tree_data_full[tree_data_full['genus_name'].isin(list_flowering)].reset_index(drop=True)
tree_data_spring.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | PENTICTON ST | Renfrew-Collingwood | CHANTICLEER PEAR | PYRUS | 2500 | 49.261036 | -123.052921 |
| 1 | E 53RD AV | Sunset | KWANZAN FLOWERING CHERRY | PRUNUS | 700 | 49.221900 | -123.087772 |
| 2 | FREMLIN ST | Oakridge | JAPANESE FLOWERING CRABAPPLE | MALUS | 6300 | 49.227886 | -123.126944 |
| 3 | W 16TH AV | Shaughnessy | PISSARD PLUM | PRUNUS | 1700 | 49.257081 | -123.144401 |
| 4 | E 5TH AV | Hastings-Sunrise | GOLDENRAIN TREE | KOELREUTERIA | 2800 | 49.265769 | -123.045915 |
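The two filters above rely on `Series.isin`, and a genus that appears in both lists ends up in both dataframes. The sketch below illustrates this with shortened, hypothetical lists (`flowering`, `deciduous`) and a five-row toy frame rather than the real data.

```python
import pandas as pd

# Shortened, hypothetical versions of the two genus lists
flowering = ["PRUNUS", "MALUS", "LABURNUM"]
deciduous = ["ACER", "PRUNUS", "MALUS"]

trees = pd.DataFrame(
    {"genus_name": ["ACER", "PRUNUS", "LABURNUM", "PINUS", "MALUS"]}
)

# isin builds a boolean mask that is True where the genus is in the list
fall = trees[trees["genus_name"].isin(deciduous)].reset_index(drop=True)
spring = trees[trees["genus_name"].isin(flowering)].reset_index(drop=True)

# Genuses present in both lists are counted in both seasons
both = sorted(set(flowering) & set(deciduous))

print(fall["genus_name"].tolist())
print(spring["genus_name"].tolist())
print(both)
```

This overlap is why the same PRUNUS and PYRUS rows can show up in both the fall and the spring tables.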
The information for the fall and spring datasets is obtained as follows:
tree_data_fall.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26627 entries, 0 to 26626
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   on_street           26627 non-null  object
 1   neighbourhood_name  26627 non-null  object
 2   common_name         26627 non-null  object
 3   genus_name          26627 non-null  object
 4   on_street_block     26627 non-null  int64
 5   latitude            26627 non-null  float64
 6   longitude           26627 non-null  float64
dtypes: float64(2), int64(1), object(4)
memory usage: 1.4+ MB
tree_data_spring.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10029 entries, 0 to 10028
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   on_street           10029 non-null  object
 1   neighbourhood_name  10029 non-null  object
 2   common_name         10029 non-null  object
 3   genus_name          10029 non-null  object
 4   on_street_block     10029 non-null  int64
 5   latitude            10029 non-null  float64
 6   longitude           10029 non-null  float64
dtypes: float64(2), int64(1), object(4)
memory usage: 548.6+ KB
From the information above, it can be seen that the columns of interest do not contain any null values. Hence, after using groupby, the grouped count can be found with either count or the size method, since the two agree when there are no nulls.
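The absence of nulls matters because `size` and `count` differ when values are missing: `size` counts rows, while `count` counts non-null entries. A small sketch on made-up data (the toy frame is an assumption, with `cultivar_name` chosen because it has nulls in the real dataset):

```python
import pandas as pd

# Toy frame with missing cultivar names
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Kitsilano", "Sunset"],
    "cultivar_name": ["KARPICK", None, None],
})

grouped = toy.groupby("neighbourhood_name")

sizes = grouped.size()                     # rows per group, nulls included
counts = grouped["cultivar_name"].count()  # non-null values per group

print(sizes.to_dict())
print(counts.to_dict())
```

On columns without nulls, such as genus_name here, both approaches give the same numbers.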
Visualising the data on a map is the best way to find out which neighbourhoods have the best fall and spring colors. To build the map, the dataset is grouped by neighbourhood; the genus names are counted and the coordinates are aggregated by their median, as follows.
tree_data_spring_neigh = tree_data_spring.groupby('neighbourhood_name').agg(
{'genus_name':'count','latitude':'median','longitude':'median'}).reset_index()
tree_data_spring_neigh = tree_data_spring_neigh.assign(genus_count = tree_data_spring_neigh['genus_name'])
tree_data_spring_neigh = tree_data_spring_neigh.drop(columns='genus_name')
tree_data_spring_neigh.head()
| | neighbourhood_name | latitude | longitude | genus_count |
|---|---|---|---|---|
| 0 | Arbutus-Ridge | 49.250025 | -123.161904 | 422 |
| 1 | Downtown | 49.277961 | -123.122331 | 78 |
| 2 | Dunbar-Southlands | 49.243260 | -123.186695 | 564 |
| 3 | Fairview | 49.264512 | -123.131607 | 208 |
| 4 | Grandview-Woodland | 49.273117 | -123.063963 | 387 |
tree_data_fall_neigh = tree_data_fall.groupby('neighbourhood_name').agg(
{'genus_name':'count','latitude':'median','longitude':'median'}).reset_index()
tree_data_fall_neigh = tree_data_fall_neigh.assign(genus_count = tree_data_fall_neigh['genus_name'])
tree_data_fall_neigh = tree_data_fall_neigh.drop(columns = 'genus_name')
tree_data_fall_neigh.head()
| | neighbourhood_name | latitude | longitude | genus_count |
|---|---|---|---|---|
| 0 | Arbutus-Ridge | 49.248710 | -123.161757 | 903 |
| 1 | Downtown | 49.279966 | -123.119568 | 892 |
| 2 | Dunbar-Southlands | 49.245445 | -123.184780 | 1526 |
| 3 | Fairview | 49.263478 | -123.130377 | 719 |
| 4 | Grandview-Woodland | 49.272675 | -123.064132 | 1179 |
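The aggregate, assign and drop sequence used above could also be written in one step with pandas named aggregation, which names the output column directly. This is an alternative sketch on a toy frame, not the notebook's own code; the resulting column order differs slightly, with genus_count placed before the coordinates.

```python
import pandas as pd

# Toy data standing in for tree_data_spring / tree_data_fall
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano", "Kitsilano", "Sunset"],
    "genus_name": ["ACER", "ACER", "PRUNUS"],
    "latitude": [49.26, 49.27, 49.22],
    "longitude": [-123.15, -123.18, -123.09],
})

# Named aggregation: output_column=(input_column, function)
neigh = (
    toy.groupby("neighbourhood_name")
    .agg(
        genus_count=("genus_name", "count"),
        latitude=("latitude", "median"),
        longitude=("longitude", "median"),
    )
    .reset_index()
)

print(neigh)
```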
The maps for spring and fall are built next, along with bar charts showing the top neighbourhoods by genus count.
url_geojson = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features',type='json'))
data_geojson_remote
Data({
format: DataFormat({
property: 'features',
type: 'json'
}),
url: 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'
})
vancouver_map = alt.Chart(data_geojson_remote).mark_geoshape(
color = 'white', opacity= 0.5, stroke='black').encode(
).project(type='identity', reflectY=True)
neighbourhood_tree_plot_spring = alt.Chart(data_geojson_remote).mark_geoshape(
stroke = 'black',strokeWidth=0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme ='redpurple'),title = None, legend = None),
tooltip =[alt.Tooltip('neighbourhood_name:N',title ='Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title='No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_spring_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity',reflectY = True).properties(title = 'Spring Genus Distribution Map')
vancouver_spring_map = vancouver_map+neighbourhood_tree_plot_spring
neighbourhood_tree_plot_fall = alt.Chart(data_geojson_remote).mark_geoshape(
stroke = 'black', strokeWidth = 0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme = 'yelloworangered'),title = None,legend=None),
tooltip =[alt.Tooltip('neighbourhood_name:N', title = 'Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title = 'No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_fall_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity', reflectY =True).properties(title ='Fall Genus Distribution Map')
vancouver_fall_map = vancouver_map + neighbourhood_tree_plot_fall
map_all = alt.hconcat(vancouver_fall_map,vancouver_spring_map).resolve_scale(color ='independent')
map_all
genus_fall_plot = alt.Chart(tree_data_fall_neigh, title = 'Top 5 Neighbourhoods with Most Fall Genuses').mark_bar(color ='orangered').encode(
alt.X('genus_count', title = None, axis= None),
alt.Y('neighbourhood_name',sort ='-x',title = None)).transform_window(
rank = 'rank(genus_count)',
sort = [alt.SortField('genus_count',order ='descending')]).transform_filter(
alt.datum.rank <= 5)
genus_fall_plot = genus_fall_plot + genus_fall_plot.mark_text(align ='left', dx=3).encode(text ='genus_count:Q', color = alt.value('black'))
genus_spring_plot = alt.Chart(tree_data_spring_neigh,title = 'Top 5 Neighbourhoods with Most Spring Genuses').mark_bar(color = 'pink').encode(
alt.X('genus_count',title= None,axis=None),
alt.Y('neighbourhood_name',sort ='-x', title=None)).transform_window(
rank = 'rank(genus_count)',
sort = [alt.SortField('genus_count',order ='descending')]).transform_filter(
alt.datum.rank <= 5)
genus_spring_plot = genus_spring_plot + genus_spring_plot.mark_text(align ='left', dx=3).encode(text ='genus_count:Q', color = alt.value('black'))
genus_season_plot = genus_fall_plot | genus_spring_plot
genus_season_plot
The following code gives the names of the top 5 neighbourhoods.
# head(5) keeps exactly five rows; .loc[0:5] is label-inclusive and would return six
spring_most_genus_df = tree_data_spring.groupby('neighbourhood_name').agg(
{'genus_name':'count'}).reset_index().sort_values('genus_name',ascending=False).reset_index(drop=True).head(5)
spring_most_genus_list = list(spring_most_genus_df['neighbourhood_name'])
fall_most_genus_df = tree_data_fall.groupby('neighbourhood_name').agg(
{'genus_name':'count'}).reset_index().sort_values('genus_name',ascending=False).reset_index(drop=True).head(5)
fall_most_genus_list = list(fall_most_genus_df['neighbourhood_name'])
From the plots it can be seen that the top 5 neighbourhoods with the most fall genuses are 'Kensington-Cedar Cottage', 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Victoria-Fraserview' and 'Dunbar-Southlands'. Similarly, the top 5 neighbourhoods for spring are 'Renfrew-Collingwood', 'Hastings-Sunrise', 'Kensington-Cedar Cottage', 'Victoria-Fraserview' and 'Sunset'. If someone is visiting Vancouver, I would recommend visiting these neighbourhoods to get the most of the Vancouver colors!
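When taking the top rows of a sorted frame, it helps to remember that label-based `.loc` slicing includes both endpoints, whereas `.iloc` and `.head` do not. The sketch below uses a made-up ranking (`ranked`) to show the difference.

```python
import pandas as pd

# Hypothetical ranking, already sorted in descending order with a fresh 0..n-1 index
ranked = pd.DataFrame({
    "neighbourhood_name": list("ABCDEFG"),
    "genus_count": [70, 60, 50, 40, 30, 20, 10],
})

six = ranked.loc[0:5]          # label slice: both endpoints included, 6 rows
five_iloc = ranked.iloc[0:5]   # positional slice: end excluded, 5 rows
five_head = ranked.head(5)     # first 5 rows

# nlargest sorts and truncates in one call
top5 = ranked.nlargest(5, "genus_count")

print(len(six), len(five_iloc), len(five_head))
print(top5["neighbourhood_name"].tolist())
```

For a "top 5" list, `head(5)`, `iloc[0:5]`, `loc[0:4]` or `nlargest(5, ...)` all return five rows.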
Our next task is to find the most common genuses responsible for the fall foliage and the spring bloom. For that, a bar chart of genus count against genus name is built, sorted so that the most common genuses come first. A dropdown selection is also provided to show the most common genuses in each neighbourhood.
fall_genus_bar = (alt.Chart(tree_data_fall).transform_aggregate(
groupby =['genus_name','neighbourhood_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
x=alt.X('genus_name:N',sort='-y',title= 'Genus Name'),
y=alt.Y('gen_count:Q', title = 'Genus Count',stack=None),
color = alt.Color('neighbourhood_name:N', scale = alt.Scale(scheme = 'yelloworangered')),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]))
fall_genus_bar
list_neigh = sorted(list(tree_data_full['neighbourhood_name'].unique()))
dropdown_neigh = alt.binding_select(name = 'Neighbourhood',options = list_neigh)
select_neigh = alt.selection_single(fields = ['neighbourhood_name'],
bind =dropdown_neigh)
fall_genus_bar_select = fall_genus_bar.add_selection(select_neigh).encode(color =alt.value('orangered'),
opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0))).properties(width=1000,height=300)
fall_genus_bar_select
spring_genus_bar = (alt.Chart(tree_data_spring).transform_aggregate(
groupby =['genus_name','neighbourhood_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
x=alt.X('genus_name:N',sort='-y',title= 'Genus Name'),
y=alt.Y('gen_count:Q', title = 'Genus Count' ),
color = alt.Color(value='pink'),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]))
spring_genus_bar
spring_genus_bar_select = spring_genus_bar.add_selection(select_neigh).encode(color =alt.value('pink'),
opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0))).properties(width=1000,height=300)
spring_genus_bar_select
title_genus_fall_plot = alt.TitleParams(text ='Most Common Tree Genuses for Fall')
genus_fall_plot = (alt.Chart(tree_data_fall,title = title_genus_fall_plot).transform_aggregate(
groupby =['genus_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
y=alt.Y('genus_name:N',sort = '-x',title = None),
x=alt.X('gen_count:Q',axis = None),
color = alt.Color(value='orangered'),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]).transform_filter(
alt.datum.rank <= 10))
common_fall_plot = genus_fall_plot + genus_fall_plot.mark_text(align ='left', dx=3).encode(text ='gen_count:Q', color = alt.value('black'))
title_genus_spring_plot = alt.TitleParams(text ='Most Common Tree Genuses for Spring')
genus_spring_plot = (alt.Chart(tree_data_spring,title = title_genus_spring_plot).transform_aggregate(
groupby =['genus_name'],
gen_count = 'count(genus_name)').mark_bar().encode(
y=alt.Y('genus_name:N',sort='-x',title= None),
x=alt.X('gen_count:Q', title = 'Genus Count',axis = None),
color = alt.Color(value='pink'),
tooltip =[alt.Tooltip('gen_count:Q',title='Genus Count')]).transform_window(
rank = 'rank(gen_count)',
sort = [alt.SortField('gen_count',order ='descending')]).transform_filter(
alt.datum.rank <= 10))
common_spring_plot = genus_spring_plot + genus_spring_plot.mark_text(align ='left', dx=3).encode(text ='gen_count:Q', color = alt.value('black'))
common_genus_plot = common_fall_plot | common_spring_plot
common_genus_plot
For further analysis, the most common genuses for fall and spring are used to find the streets with the most colours. As the plots above show, the counts for these genuses are considerably larger than those for the rest.
common_genus_df = tree_data_spring.groupby('genus_name').agg(
{'common_name':'count'}).reset_index().sort_values('common_name',ascending=False).reset_index(drop=True).loc[0:4]
common_genus_spring = list(common_genus_df['genus_name'])
common_genus_spring
['PRUNUS', 'MALUS', 'MAGNOLIA', 'CRATAEGUS', 'PYRUS']
common_genus_df = tree_data_fall.groupby('genus_name').agg(
{'common_name':'count'}).reset_index().sort_values('common_name',ascending=False).reset_index(drop=True).loc[0:9]
common_genus_fall = list(common_genus_df['genus_name'])
common_genus_fall
['ACER', 'PRUNUS', 'FRAXINUS', 'TILIA', 'CARPINUS', 'QUERCUS', 'FAGUS', 'MALUS', 'MAGNOLIA', 'CRATAEGUS']
For further analysis, the spring and fall dataframes need to be filtered using the common_genus_spring and common_genus_fall lists, as follows.
tree_common_fall = tree_data_fall[tree_data_fall['genus_name'].isin(common_genus_fall)].reset_index(drop=True)
tree_common_fall.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | MAPLE ST | Kitsilano | SYCAMORE MAPLE | ACER | 2900 | 49.259856 | -123.150586 |
| 1 | WALES ST | Renfrew-Collingwood | PRINCETON GOLD MAPLE | ACER | 5200 | 49.236650 | -123.051831 |
| 2 | W BROADWAY | Kitsilano | KARPICK RED MAPLE | ACER | 3600 | 49.264250 | -123.184020 |
| 3 | E 53RD AV | Sunset | KWANZAN FLOWERING CHERRY | PRUNUS | 700 | 49.221900 | -123.087772 |
| 4 | SE MARINE DRIVE | Sunset | PYRAMIDAL EUROPEAN HORNBEAM | CARPINUS | 100 | 49.211478 | -123.102993 |
tree_common_spring = tree_data_spring[tree_data_spring['genus_name'].isin(common_genus_spring)].reset_index(drop=True)
tree_common_spring.head()
| | on_street | neighbourhood_name | common_name | genus_name | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|
| 0 | PENTICTON ST | Renfrew-Collingwood | CHANTICLEER PEAR | PYRUS | 2500 | 49.261036 | -123.052921 |
| 1 | E 53RD AV | Sunset | KWANZAN FLOWERING CHERRY | PRUNUS | 700 | 49.221900 | -123.087772 |
| 2 | FREMLIN ST | Oakridge | JAPANESE FLOWERING CRABAPPLE | MALUS | 6300 | 49.227886 | -123.126944 |
| 3 | W 16TH AV | Shaughnessy | PISSARD PLUM | PRUNUS | 1700 | 49.257081 | -123.144401 |
| 4 | ELLIOTT ST | Victoria-Fraserview | REDBUD CRABAPPLE | MALUS | 6000 | 49.229089 | -123.054659 |
Grouping the data by the neighbourhood_name, genus_name and on_street columns gives the streets with the most color.
genus_street_f = tree_common_fall.groupby(['neighbourhood_name','genus_name','on_street']).size()
genus_street_fall = genus_street_f.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_fall_fil = genus_street_fall[genus_street_fall['genus_count']>=30]
genus_street_fall_fil.head()
| | neighbourhood_name | genus_name | on_street | genus_count |
|---|---|---|---|---|
| 0 | Kitsilano | ACER | W 6TH AV | 59 |
| 1 | Kitsilano | ACER | W 11TH AV | 54 |
| 2 | Kitsilano | ACER | W 15TH AV | 51 |
| 3 | Kensington-Cedar Cottage | ACER | KINGSWAY | 46 |
| 4 | Shaughnessy | ACER | ANGUS DRIVE | 45 |
genus_street_s = tree_common_spring.groupby(['neighbourhood_name','genus_name','on_street']).size()
genus_street_spring = genus_street_s.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_spring_fil = genus_street_spring[genus_street_spring['genus_count']>=25]
genus_street_spring_fil.head()
| | neighbourhood_name | genus_name | on_street | genus_count |
|---|---|---|---|---|
| 0 | Marpole | PRUNUS | W 59TH AV | 41 |
| 1 | Arbutus-Ridge | PRUNUS | W 22ND AV | 38 |
| 2 | Victoria-Fraserview | PRUNUS | DUMFRIES ST | 38 |
| 3 | Renfrew-Collingwood | PRUNUS | RUPERT ST | 35 |
| 4 | Kensington-Cedar Cottage | PRUNUS | DUMFRIES ST | 34 |
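The group, count, sort and threshold pattern used for both seasons can be condensed slightly, since `Series.reset_index` accepts a `name` argument and makes the separate `to_frame` call unnecessary. The sketch below uses a toy frame and a threshold of 2, both chosen for illustration only.

```python
import pandas as pd

# Toy stand-in for tree_common_fall / tree_common_spring
toy = pd.DataFrame({
    "neighbourhood_name": ["Kitsilano"] * 3 + ["Marpole"] * 2,
    "genus_name": ["ACER", "ACER", "ACER", "PRUNUS", "PRUNUS"],
    "on_street": ["W 6TH AV", "W 6TH AV", "W 11TH AV", "W 59TH AV", "W 59TH AV"],
})

street_counts = (
    toy.groupby(["neighbourhood_name", "genus_name", "on_street"])
    .size()                              # trees per (neighbourhood, genus, street)
    .reset_index(name="genus_count")     # Series -> DataFrame with a named count column
    .sort_values("genus_count", ascending=False)
    .reset_index(drop=True)
)

# Keep only streets with at least 2 trees of the same genus (illustrative threshold)
busy = street_counts[street_counts["genus_count"] >= 2]
print(busy)
```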
The heatmaps for the common genuses are obtained as follows.
common_genus_heat_fall = alt.Chart(tree_common_fall).mark_rect().encode(
alt.X('neighbourhood_name',title=None),
alt.Y('genus_name',title=None),
alt.Color('count()', scale = alt.Scale(scheme = 'yelloworangered'),legend=None),
tooltip = 'count()')
common_genus_heat_spring = alt.Chart(tree_common_spring).mark_rect().encode(
alt.X('neighbourhood_name',title=None),
alt.Y('genus_name',title=None),
alt.Color('count()', scale = alt.Scale(scheme = 'redpurple'),legend=None),
tooltip = 'count()')
(common_genus_heat_fall | common_genus_heat_spring).resolve_scale(color = 'independent')
click = alt.selection_single(fields = ['genus_name'],bind = 'legend')
genus_street_spring_plot_bar = alt.Chart(genus_street_spring_fil, title= 'Spring Genuses on the Streets of Vancouver').mark_bar().encode(
alt.X('on_street',title =None,sort='-y'),
alt.Y('genus_count', title = None,stack=None),
alt.Color('genus_name',scale = alt.Scale(scheme='redpurple'),title = 'Genus Name'),
opacity = alt.condition(click,alt.value(0.9),alt.value(0)),
tooltip = alt.Tooltip('genus_count')).add_selection(click)
genus_street_spring_plot_bar
genus_street_spring_plot_facet = alt.Chart(genus_street_spring_fil, title= 'Spring Genuses on the Streets of Vancouver').mark_bar().encode(
alt.X('on_street',title =None,sort='-y'),
alt.Y('genus_count', title = None,stack=None),
alt.Color('genus_name',scale = alt.Scale(scheme='redpurple'),title = 'Genus Name'),
tooltip = alt.Tooltip('genus_count')).properties(height=300,width=400).facet('genus_name',columns=1)
genus_street_spring_plot_facet
genus_street_fall_plot_bar = alt.Chart(genus_street_fall_fil, title= 'Fall Genuses on the Streets of Vancouver').mark_bar().encode(
alt.X('on_street',title =None,sort='-y'),
alt.Y('genus_count', title = None,stack=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'reds'), title ='Genus Name'),
opacity = alt.condition(click,alt.value(0.9),alt.value(0)),
tooltip = alt.Tooltip('genus_count')).add_selection(click)
genus_street_fall_plot_bar
genus_street_fall_plot_facet = alt.Chart(genus_street_fall_fil, title= 'Fall Genuses on the Streets of Vancouver').mark_bar().encode(
alt.X('on_street',title =None,sort='-y'),
alt.Y('genus_count', title = None,stack=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'reds'), title ='Genus Name'),
tooltip = alt.Tooltip('genus_count')).properties(height=300,width=400).facet('genus_name',columns=2)
genus_street_fall_plot_facet
genus_street_fall_plot = alt.Chart(genus_street_fall_fil, title= 'Fall Genuses on the Streets of Vancouver').mark_circle(size =200).encode(
alt.X('on_street',title =None),
alt.Y('genus_name', title = None),
alt.Color('genus_count', scale = alt.Scale(scheme = 'yelloworangered'), legend=None),
alt.Tooltip('genus_count'))
genus_street_fall_plot
genus_street_spring_plot = alt.Chart(genus_street_spring_fil, title= 'Spring Genuses on the Streets of Vancouver').mark_circle(size =200).encode(
alt.X('on_street',title =None),
alt.Y('genus_name', title = None),
alt.Color('genus_count',scale = alt.Scale(scheme ='redpurple'), legend=None),
alt.Tooltip('genus_count'))
genus_street_spring_plot
In general, visiting the streets shown in these plots gives the most fall and spring color. According to the data, the top 5 streets for fall colors are W 6TH AV, W 11TH AV, W 15TH AV, KINGSWAY and ANGUS DRIVE. Similarly, the top 5 streets for spring bloom are W 59TH AV, W 22ND AV, DUMFRIES ST (Victoria-Fraserview), RUPERT ST and DUMFRIES ST (Kensington-Cedar Cottage).
The genus count per street doesn't give the full picture, since the trees may not even be on the same block. Grouping the data to include the block information gives a better idea of the exact location where the trees can be found.
genus_street_sb = tree_data_spring.groupby(['neighbourhood_name','genus_name','on_street','on_street_block']).size()
genus_street_block_spring = genus_street_sb.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_block_spring_fil = genus_street_block_spring[genus_street_block_spring['genus_count']>8]
genus_street_block_spring_fil.head()
| | neighbourhood_name | genus_name | on_street | on_street_block | genus_count |
|---|---|---|---|---|---|
| 0 | Killarney | PRUNUS | BUTLER ST | 7700 | 17 |
| 1 | Kensington-Cedar Cottage | PRUNUS | E 20TH AV | 1400 | 13 |
| 2 | Killarney | MAGNOLIA | SPARBROOK CRESCENT | 7700 | 13 |
| 3 | Victoria-Fraserview | PRUNUS | HARRISON DRIVE | 2300 | 12 |
| 4 | West Point Grey | PRUNUS | W 10TH AV | 4400 | 12 |
genus_street_fb = tree_data_fall.groupby(['neighbourhood_name','on_street','genus_name','on_street_block']).size()
genus_street_block_fall = genus_street_fb.to_frame(name = 'genus_count').reset_index().sort_values('genus_count',ascending=False).reset_index(drop=True)
genus_street_block_fall_fil = genus_street_block_fall[genus_street_block_fall['genus_count']>8]
genus_street_block_fall_fil.head()
| | neighbourhood_name | on_street | genus_name | on_street_block | genus_count |
|---|---|---|---|---|---|
| 0 | Killarney | BUTLER ST | PRUNUS | 7700 | 17 |
| 1 | Mount Pleasant | ATHLETES WAY | ACER | 100 | 15 |
| 2 | South Cambie | W 22ND AV | ACER | 900 | 13 |
| 3 | Kensington-Cedar Cottage | E 20TH AV | PRUNUS | 1400 | 13 |
| 4 | Killarney | SPARBROOK CRESCENT | MAGNOLIA | 7700 | 13 |
From the data above, the top 5 spots to see the fall colors are the 7700 block of BUTLER ST, the 100 block of ATHLETES WAY, the 900 block of W 22ND AV, the 1400 block of E 20TH AV and the 7700 block of SPARBROOK CRESCENT. Similarly, the spring bloom can be observed on the 7700 block of BUTLER ST, the 1400 block of E 20TH AV, the 7700 block of SPARBROOK CRESCENT, the 2300 block of HARRISON DRIVE and the 4400 block of W 10TH AV.
genus_street_block_spring_plot = alt.Chart(genus_street_block_spring_fil).mark_rect().encode(
alt.X('on_street',title = None),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_count',legend=None,title=None, scale = alt.Scale(scheme = 'redpurple')),
alt.Tooltip(['genus_name','genus_count']))
genus_street_block_spring_plot
genus_street_block_fall_plot = alt.Chart(genus_street_block_fall_fil).mark_rect().encode(
alt.X('on_street',title = None),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_count',legend=None,title=None, scale = alt.Scale(scheme = 'yelloworangered')),
alt.Tooltip(['genus_name','genus_count'])).properties(height=450,width=900)
genus_street_block_fall_plot
The following plots use an interactive selection on neighbourhood to show the individual genus distribution for each neighbourhood in fall and spring.
select_neighbourhood = alt.selection_single(fields=['neighbourhood_name'],on='mouseover',clear='mouseout', bind='legend')
neighbourhood_tree_plot_fall = alt.Chart(data_geojson_remote).mark_geoshape(
stroke = 'black', strokeWidth = 0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme = 'yelloworangered'),title = None,legend=None),
opacity=alt.condition(select_neighbourhood,alt.value(1), alt.value(0.1)),
tooltip =[alt.Tooltip('neighbourhood_name:N', title = 'Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title = 'No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_fall_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity', reflectY =True).add_selection(select_neighbourhood).properties(title ='Fall Genus Distribution Map')
vancouver_fall_map = vancouver_map + neighbourhood_tree_plot_fall
genus_count_scatter_fall = alt.Chart(tree_data_fall).mark_circle(size=70,stroke='black').encode(
alt.X('genus_name',title = 'Genus Name',sort='-y'),
alt.Y('count()',title= ' Genus Count'),
alt.Color('neighbourhood_name',title = 'Neighbourhood Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0)),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('count(genus_name)',title ='Genus Count')]).add_selection(select_neighbourhood).properties(width=900,height =400)
vancouver_fall_map & genus_count_scatter_fall
select_neighbourhood = alt.selection_single(fields=['neighbourhood_name'],on='mouseover',clear='mouseout', bind='legend')
neighbourhood_tree_plot_spring = alt.Chart(data_geojson_remote).mark_geoshape(
stroke = 'black', strokeWidth = 0.15).encode(
color = alt.Color('genus_count:Q',scale = alt.Scale(scheme ='redpurple'),title = None,legend=None),
opacity=alt.condition(select_neighbourhood,alt.value(1), alt.value(0.1)),
tooltip =[alt.Tooltip('neighbourhood_name:N', title = 'Neighbourhood Name'),
alt.Tooltip('genus_count:Q',title = 'No of Genus')]
).transform_lookup(lookup = 'properties.name',from_ = alt.LookupData(
tree_data_spring_neigh,'neighbourhood_name',['neighbourhood_name','genus_count'])).project(
type = 'identity', reflectY =True).add_selection(select_neighbourhood).properties(title ='Spring Genus Distribution Map')
vancouver_spring_map = vancouver_map + neighbourhood_tree_plot_spring
genus_count_scatter_spring = alt.Chart(tree_data_spring).mark_circle(size=70,stroke='black').encode(
alt.X('genus_name',title = 'Genus Name',sort='-y'),
alt.Y('count()',title= ' Genus Count'),
alt.Color('neighbourhood_name',title = 'Neighbourhood Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0)),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('count(genus_name)',title ='Genus Count')]).add_selection(select_neighbourhood).properties(width=900,height =300)
vancouver_spring_map & genus_count_scatter_spring
common_genus_scatter_spring = alt.Chart(tree_common_spring).mark_circle(size=100,stroke='black').encode(
alt.X('neighbourhood_name',title = None),
alt.Y('count()', title =None,stack=None),
alt.Color('genus_name',title='Genus Name'),
opacity=alt.condition(select_neighbourhood, alt.value(0.7), alt.value(0))).add_selection(select_neighbourhood)
common_genus_scatter_spring
(genus_count_scatter_spring & (vancouver_spring_map | common_genus_scatter_spring))
common_genus_scatter_fall = alt.Chart(tree_common_fall).mark_circle(size=100,stroke='black').encode(
alt.X('neighbourhood_name',title = 'Neighbourhood Name'),
alt.Y('count()', title =None),
alt.Color('genus_name',title='Genus Name'))
common_genus_scatter_fall
genus_street_plot = (genus_street_spring_plot.encode(opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0)))
& genus_street_fall_plot.encode(opacity=alt.condition(select_neigh,alt.value(0.9),alt.value(0)))).resolve_scale(color='independent')
genus_street_plot.add_selection(select_neigh)
The plot above shows the distribution of the number of genera per street. The dropdown menu lets us check whether a street belongs to the neighbourhood we are interested in. A tooltip is provided to help us keep track of the genus count.
Now let's explore the exact locations where groups of trees can be seen for the best view. The filtered dataframes for spring and fall are plotted as shown. Even though the street block is stored as an int value, we need to specify it as ordinal data, because it is a street block number, not a continuous value.
slider_count_fall = alt.binding_range(name = 'Fall Genus Count',
step = 1,
min = min(genus_street_block_fall_fil['genus_count']),
max = max(genus_street_block_fall_fil['genus_count']))
radio_genus_fall = alt.binding_radio(name = 'Common Fall Genus', options = common_genus_fall )
radio_slider_select_fall = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count_fall,'genus_name':radio_genus_fall})
street_block_plot_fall = alt.Chart(genus_street_block_fall_fil).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'yelloworangered'),legend=None)).properties(height=350,width=800)
street_block_plot_fall
nearest = alt.selection_single(nearest=True, on='mouseover',
fields=['on_street'], clear='mouseout')
slider_count_spring = alt.binding_range(name = 'Spring Genus Count',
step = 1,
min = min(genus_street_block_spring_fil['genus_count']),
max = max(genus_street_block_spring_fil['genus_count']))
radio_genus_spring = alt.binding_radio(name = 'Common Spring Genus', options = common_genus_spring )
radio_slider_select_spring = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count_spring, 'genus_name':radio_genus_spring})
street_block_plot_spring = alt.Chart(genus_street_block_spring_fil).mark_circle(size=70,stroke='black').encode(
alt.X('on_street',title = None,axis=alt.Axis(grid = False)),
alt.Y('on_street_block:O',title=None),
alt.Color('genus_name', scale = alt.Scale(scheme = 'redpurple'), legend=None),
tooltip = [alt.Tooltip('neighbourhood_name',title='Neighbourhood Name'),
alt.Tooltip('on_street_block',title='Block Number'),
alt.Tooltip('on_street', title ='Street Name')]).properties(height=350,width=500)
street_block_plot_spring
A tooltip is provided to find the exact location where the desired colors can be seen.
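# Slider bounds span both seasons, since the selection filters both plots.
slider_count = alt.binding_range(name = 'Genus Count',
step = 1,
min = min(min(genus_street_block_fall_fil['genus_count']), min(genus_street_block_spring_fil['genus_count'])),
max = max(max(genus_street_block_fall_fil['genus_count']), max(genus_street_block_spring_fil['genus_count'])))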
# Sort the combined genus names so the radio buttons appear in a stable order.
common_genus_fall_spring = sorted(set(common_genus_fall + common_genus_spring))
radio_genus = alt.binding_radio(name = 'Common Genus', options = common_genus_fall_spring)
radio_slider_select = alt.selection_single(fields = ['genus_count','genus_name'],
bind = {'genus_count':slider_count,'genus_name':radio_genus})
street_block_plot_combined = (street_block_plot_fall.encode(opacity = alt.condition(radio_slider_select, alt.value(0.9), alt.value(0.1)))
& street_block_plot_spring.encode(opacity = alt.condition(radio_slider_select, alt.value(0.9), alt.value(0.1)))
).add_selection(radio_slider_select)
street_block_plot_combined
The EDA was helpful for exploring the data and for trying out different visualizations. It helped me decide which plots are useful for answering the questions I put forward. To answer question 1, I found that the map together with the point plot would be helpful. To answer question 2, different plots were tried out, and the mark_circle plot was chosen for inclusion in the final_report. To answer question 3, a mark_circle plot is used as well. Since the aim is to find where each genus is located, the street name is plotted against the block number, with a tooltip on each point. The widgets in this plot let us select the genus name with radio buttons and filter the genus count with the slider.